9 research outputs found

    Bayesian inference in molecular phylogeography using Markov chain Monte Carlo

    The Bayesian approach to phylogenetic inference allows quantification of all aspects of uncertainty using probability. Markov chain Monte Carlo (MCMC), a class of algorithms based on iterative simulation, is often considered a gold standard for approximate Bayesian inference. However, MCMC is computationally intensive and there are many design decisions to be made when using it in practice. We discuss a few principles for designing simple and efficient MCMC algorithms. In particular, we propose several new proposal kernels for MCMC based on the idea of introducing negative correlations in the simulation draws. In many cases, these kernels can lead to efficiency >100%. Using practical examples, we illustrate that a sequence of well-designed one-dimensional proposals can be more efficient than a single d-dimensional proposal, and that variable transformations can be used as a general strategy for designing efficient MCMC. Next, we turn to the problem of species tree inference in the Anopheles gambiae species complex from whole-genome data. This is a challenging problem due to complex effects of recent and rapid radiation, introgression, chromosome inversions and natural selection. We extract over 80,000 coding and noncoding loci from the genomes of six members of this species complex and perform Bayesian inference using MCMC under the multispecies coalescent model, which takes into account genealogical heterogeneity across the genome and uncertainty in the gene trees. We obtain a robust species tree estimate, consistent with chromosome inversions. Using simulation informed by the real data, we conclude that species trees from previous studies are erroneous as a result of methodological artefacts. We also found evidence of gene flow between certain pairs of species based on direct estimation of migration rates under the isolation-with-migration model.
The results highlight the importance of accommodating incomplete lineage sorting and introgression in phylogenomic analyses of species that arose through recent radiative speciation events.

    Full‐likelihood genomic analysis clarifies a complex history of species divergence and introgression: the example of the erato‐sara group of Heliconius butterflies

    Introgressive hybridization plays a key role in adaptive evolution and species diversification in many groups of species. However, frequent hybridization and gene flow between species make estimation of the species phylogeny and key population parameters challenging. Here, we show that by accounting for phasing and using full-likelihood methods, introgression histories and population parameters can be estimated reliably from whole-genome sequence data. We employ the multispecies coalescent (MSC) model with and without gene flow to infer the species phylogeny and cross-species introgression events using genomic data from six members of the erato-sara clade of Heliconius butterflies. The methods naturally accommodate random fluctuations in genealogical history across the genome due to deep coalescence. To avoid heterozygote phasing errors in haploid sequences commonly produced by genome assembly methods, we process and compile unphased diploid sequence alignments and use analytical methods to average over uncertainties in heterozygote phase resolution. There is robust evidence for introgression across the genome, both among distantly related species deep in the phylogeny and between sister species in shallow parts of the tree. We obtain chromosome-specific estimates of key population parameters such as introgression directions, times and probabilities, as well as species divergence times and population sizes for modern and ancestral species. We confirm ancestral gene flow between the sara clade and an ancestral population of Heliconius telesiphe, a likely hybrid speciation origin for Heliconius hecalesia, and gene flow between the sister species Heliconius erato and Heliconius himera. Inferred introgression among ancestral species also explains the history of two chromosomal inversions deep in the phylogeny of the group. 
This study illustrates how a full-likelihood approach based on the MSC makes it possible to extract rich historical information of species divergence and gene flow from genomic data. [3S; BPP; gene flow; Heliconius; hybrid speciation; introgression; inversion; multispecies coalescent.]

    Inferring the Direction of Introgression Using Genomic Sequence Data

    Genomic data are informative about the history of species divergence and interspecific gene flow, including the direction, timing, and strength of gene flow. However, gene flow in opposite directions generates similar patterns in multilocus sequence data, such as reduced sequence divergence between the hybridizing species. As a result, inference of the direction of gene flow is challenging. Here we investigate the information about the direction of gene flow present in genomic sequence data using likelihood-based methods under the multispecies-coalescent-with-introgression model. We analyze the case of two species, and use simulation to examine cases with three or four species. We find that it is easier to infer gene flow from a small population to a large one than in the opposite direction, and easier to infer inflow (gene flow from outgroup species to an ingroup species) than outflow (gene flow from an ingroup species to an outgroup species). It is also easier to infer gene flow if there is a longer time of separate evolution between the initial divergence and subsequent introgression. When introgression is assumed to occur in the wrong direction, the time of introgression tends to be correctly estimated and the Bayesian test of gene flow is often significant, while estimates of introgression probability can be even greater than the true probability. We analyze genomic sequences from Heliconius butterflies to demonstrate that typical genomic datasets are informative about the direction of interspecific gene flow, as well as its timing and strength.

    Uncertainty in the Timing of Origin of Animals and the Limits of Precision in Molecular Timescales

    The timing of divergences among metazoan lineages is integral to understanding the processes of animal evolution, placing the biological events of species divergences into the correct geological timeframe. Recent fossil discoveries and molecular clock dating studies have suggested a divergence of bilaterian phyla >100 million years before the Cambrian, when the first definite crown-bilaterian fossils occur. Most previous molecular clock dating studies, however, have suffered from limited data and biases in methodologies, and virtually all have failed to acknowledge the large uncertainties associated with the fossil record of early animals, leading to inconsistent estimates among studies. Here we use an unprecedented amount of molecular data, combined with four fossil calibration strategies (reflecting disparate and controversial interpretations of the metazoan fossil record) to obtain Bayesian estimates of metazoan divergence times. Our results indicate that the uncertain nature of ancient fossils and violations of the molecular clock impose a limit on the precision that can be achieved in estimates of ancient molecular timescales. For example, although we can assert that crown Metazoa originated during the Cryogenian (with most crown-bilaterian phyla diversifying during the Ediacaran), it is not possible with current data to pinpoint the divergence events with sufficient accuracy to test for correlations between geological and biological events in the history of animals. Although a Cryogenian origin of crown Metazoa agrees with current geological interpretations, the divergence dates of the bilaterians remain controversial. Thus, attempts to build evolutionary narratives of early animal evolution based on molecular clock timescales appear to be premature.

    A novel Ancestral Beijing sublineage of Mycobacterium tuberculosis suggests the transition site to Modern Beijing sublineages.

    The global Mycobacterium tuberculosis population comprises 7 major lineages. The Beijing strains, particularly those classified as Modern groups, have been found worldwide, are frequently associated with drug resistance, younger ages and outbreaks, and appear to be expanding. Here, we report an analysis of whole genome sequences of 1170 M. tuberculosis isolates together with their patient profiles. Our samples belonged to Lineages 1-4 (L1-L4), with those of L1 and L2 being equally dominant. Phylogenetic analysis revealed several new or rare sublineages. Differential associations between sublineages of M. tuberculosis and patient profiles, including ages, ethnicity, HIV (human immunodeficiency virus) infection and drug resistance, were demonstrated. The Ancestral Beijing strains and some sublineages of L4 were associated with ethnic minorities, while L1 was more common in Thais. L2.2.1.Ancestral 4 surprisingly had a mutation that is typical of the Modern Beijing sublineages and was common in Akha and Lahu tribes who have migrated from Southern China in the last century. This may indicate that the evolutionary transition from the Ancestral to Modern Beijing sublineages might have been gradual and occurred in Southern China, where the presence of multiple ethnic groups might have allowed for the circulation of various co-evolving sublineages, which ultimately led to the emergence of the Modern Beijing strains.

    Major patterns in the introgression history of Heliconius butterflies

    Gene flow between species, although usually deleterious, is an important evolutionary process that can facilitate adaptation and lead to species diversification. It also makes estimation of species relationships difficult. Here, we use the full-likelihood multispecies coalescent (MSC) approach to estimate species phylogeny and major introgression events in Heliconius butterflies from whole-genome sequence data. We obtain a robust estimate of species branching order among major clades in the genus, including the ‘melpomene-silvaniform’ group, which shows extensive historical and ongoing gene flow. We obtain chromosome-level estimates of key parameters in the species phylogeny, including species divergence times, present-day and ancestral population sizes, as well as the direction, timing, and intensity of gene flow. Our analysis leads to a phylogeny with introgression events that differ from those obtained in previous studies. We find that Heliconius aoede most likely represents the earliest-branching lineage of the genus and that ‘silvaniform’ species are paraphyletic within the melpomene-silvaniform group. Our phylogeny provides new, parsimonious histories for the origins of key traits in Heliconius, including pollen feeding and an inversion involved in wing pattern mimicry. Our results demonstrate the power and feasibility of the full-likelihood MSC approach for estimating species phylogeny and key population parameters despite extensive gene flow. The methods used here should be useful for analysis of other difficult species groups with high rates of introgression.